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Abstract 

We discuss methods that allow to increase the step-size in a parallel tempering simulation of 
statistical models and test them at the example of the three-dimensional Heisenberg spin glass. 
We find an overall speed-up of about two for contemporary lattices. 
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I. INTRODUCTION 



The Monte Carlo simulation of statistical models with a rugged free energy landscape 
is notoriously difficult. At low temperatures the simulation might get stuck in one of the 
valleys of the free energy, which leads to very large auto-correlation times. One way to 
overcome this problem is the so called parallel tempering algorithm, also called random 
exchange method or multiple Markov-chain method jl|, 0] • 

In a parallel tempering simulation, N replicas of the statistical system are simulated 
in parallel at different temperatures. From the low temperature which is the target of 
the investigation, a tower of intermediate temperatures is built up to a point, where the 
configurations can move easily from one valley of the free energy to an other. Let us denote 
the inverse of these temperatures by j5\ < (3 2 < ... < (5n- 

The parallel tempering method involves two components. One is the update of each indi- 
vidual replica, independently of the others, using a standard algorithm, e.g. local Metropolis 
updates. The other are updates that allow to swap two configurations between neighboring 
temperatures. With this step, configurations can travel from low to high temperatures and 
back and thereby bypass the barriers at low temperatures. 

In most implementations, this is realized by proposing the exchange of the field configu- 
rations at inverse temperatures (3i and A+i. This is accepted with the probability 

A;wap = min[l, exp((/3 m - 0i)(H i+1 - Hi))} , (1) 

where H i+ i and Hi are the values of the Hamiltonian of the replica at f3i + \ and 0i, respec- 
tively. For most systems, this is an inexpensive step, however, if the temperatures are chosen 
too far apart, the acceptance rate will very quickly drop to zero. Therefore, the /3, have to 
be chosen close enough such that the acceptance rate (A swap ) > 0.1. In particular for large 
systems, this can require the use of large numbers of intermediate temperatures, at each of 
which a replica has to be simulated. 

Here we discuss modifications of the replica exchange step of the algorithm. The basic 
idea is to not simply swap the two replica, but to modify them in such a way as to improve 
the probability that the move is accepted. These modifications lead to a higher acceptance 
rate at given differences — or the same acceptance rates can be achieved using larger 
differences in the inverse temperature. 

We like to mention that in 0, Q annealed swapping is used to this end. The authors 
of jjjj], who have simulated the two-dimensional XY model on the square lattice, could 
indeed reach larger steps in the temperature, however, this progress does not compensate 
for the additional effort needed for the auxiliary update steps. The authors of [4| have 
performed a molecular dynamics simulation of 10 classical non-interacting particles in a 
potential with several local minima. They find that annealed swapping "is able to achieve 
the computational efficiency of ordinary replica exchange, using fewer replicas." 

In this work, we shall discuss these methods for the example of the three-dimensional 
Heisenberg spin glass, however, they are quite general and can be easily adapted to other 
models. The Monte Carlo simulation of spin glasses in general is quite challenging. For 
the three-dimensional Ising spin glass at the transition temperature and at temperatures 
below, only lattices up to 32 3 have been simulated [5}. While there is consensus that the 
model undergoes a second order phase transition, the error bars of critical exponents are 
quite large. 

In this letter, we discuss the case of the physically more realistic Heisenberg spin glass, 
where even the nature of the phase transition is still under debate. Recent works are {!, 0]. 
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We consider the Heisenberg spin glass on a simple cubic lattice with periodic boundary 
conditions. The classical Hamiltonian is given by 

H ^ J xy $x ' &y i (2) 

<xy> 

where the field variables s x are unit vectors with three real components; x and y denote the 
sites of the lattice. The summation runs over pairs <xy> of nearest neighbor sites. The J xy 
are nearest neighbor interactions with a Gaussian distribution of zero mean and standard 
deviation unity. For each set of these quenched interactions, the expectation values of the 
observables are computed and then averaged over different realizations of the J xy . 



II. OBSERVABLES 



In order to study the performance of the algorithm, we have measured the overlap sus- 
ceptibility, which is constructed from the overlap variable 



s (l) ,(2) 



(3) 



where si and are the fields at the site x of two statistically independent configurations 
{s*} 1 and {s } 2 . The overlap susceptibility is then given by 



X 



-y 



Furthermore, we have measured the internal energy defined by 



'xy $x ' Sy 



(4) 



(5) 



<xy> 



In a physics study of the model, one would consider a larger list of quantities, including 
e.g. the second moment correlation length and various cumulants. Also one might study so 
called chiral quantities; see Refs. 0, 0| for their definition. 

To get the two independent configurations required by Eq. ([3]), we simulated, as it is usu- 
ally done, two copies of the system for each of the temperatures. Alternatively one might 
simulate a single copy and store the configurations on disk. Then one can combine config- 
urations that are separated by t ^> r in the Markov chain, where r is the autocorrelation 
time. 



III. IMPROVED REPLICA EXCHANGE 

In the following two sections, we describe two methods, which improve on the traditional 
replica exchange between neighboring temperatures. The first is a decimation procedure, 
where we study the update under an effective action in which half the fields have been 
integrated out. It is described in Sec. IIII A[ In the second procedure, given in Sec. IIIIBl 
we apply an invertible cooling/heating transformation of the fields, which leads also to a 
higher acceptance in the exchange step. For completeness, we give the details of the hybrid- 
overrelaxation algorithm used to simulated the individual replica between the exchange steps 
in Sec. IfflCl 
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A. Decimation 



The method described in this section is applicable to general models, where a sub-set of 
lattice points can be chosen such, that they do not interact among each other. Since the 
generalization to other models, e.g. the Ising spin glass, is trivial, we stay in our discussion 
with the Heisenberg spin glass model on the simple cubic lattice. The partition function in 
terms of the action 

s\p, {s}} = /3H[{s}] = -pJ2 J *y g * • g y ( 6 ) 

<xy> 



is given by 



n 



ds- 



exp(-£[/?,{s}]) 



(7) 



The simple cubic lattice can be divided into two sub-sets of points, one called white (W), 
the other black (B), such that the black points have only white neighbors and vice versa. 
This allows us to write the partition function as 



ds. 



n 

xEB 



ds. 



exp(-S[/?,{s}]) 



(8) 



where now all the integrations over the fields on the black sites can be performed: 

n / ^ exp(-s[/?, {s}]) = n / d ** e MP s x -s x ) =n hp 



(9) 



where 



xy Sy 



(10) 



y.nn.x 



is the sum over the fields on the nearest neighbor sites of x. In Eq. we have introduced 
the abbreviation 

Sx) = j d4exp(/3 s x ■ S x ) . (11) 
In the case of the Heisenberg model, this integral can be easily performed 

J ds 1>x j ds 2 , x J ds 3>x 8(s^-l) exp(R x s 1>x ) = c J ds hx exp(R x s l!X ) 



2c 
Rx 



smh(R x ) 



(12) 



where we have rotated the problem such that S x is a multiple of (1,0,0) and R x = {3\S X \. 
Putting everything together, the partition function reads 



Z=H ! ds x exp(-S[/?,{sVD with S[(3 J {s} w } = -J2^I((3 S 
x&w -I xeB 



(13) 



where the subscript W indicates that S depends only on the fields on the white sites. 
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Now we perform a tempering step with the fields on the white sites only, using the action 
S(fa {s }w)- Given a field {s }\ ¥ at fa and a field {s }^ at fa, the proposal is to swap to 
{s }\ v at fa and {s }\ v at fa. The acceptance probability for this swap is given by 



A swap = min 



TlxeB H^sl 1 ^) Yl xeB i(fasi 2 ^ 



(14) 



In the case of the Heisenberg model, the evaluation of this expression is relatively sim- 
ple. First we notice that the prefactors jjr cancel. Therefore it remains to evaluate 
Y[ xeB smh(R x ) , for which details are given in Appendix lAl 

In principle, one could also perform the updates using the decimated action S(fa {s }w)- 
However there is no efficient update for this action. The best idea might be to perform 
local Metropolis updates of {s }w- This would require to evaluate AS (fa {s }w)i which is 
relatively expensive. 

Instead, after the tempering step, we insert the fields on the black sites again, using 
their (local) Boltzmann weight. Technically, this is done in exactly the same way as a 
heatbath update is performed. Having restored the fields on the black sites, we can perform 
overrelaxation and heatbath sweeps as usual. 

In our simulations, in order to save CPU-time, we only insert new fields on the black sites 
if the swap is accepted, otherwise the fields keep their old values. Furthermore, we alternate 
the role of black and white sites from one pair of fa values to the next. In the case of the 
first pair, we chose randomly whether the fields on black or white sites are decimated. 



B. Cooling and Heating 

Let us now turn to the second idea to improve the replica exchange step. Inspired 
by the field transformations proposed in the framework of the Hybrid Monte Carlo al- 
gorithm it improves on the standard step, which exchanges just the configuration 
({^*} 1 '{^*} 2 ) ~~ ({^} 2 , {s} 1 ) evaluating the action at the respective other parameters by 
applying an invertible field transformation to the configurations 

m\{?} 2 ) ^ m^n^m 1 )) ■ 

This can be successful, if we manage to find a transformation, which transforms a "typical" 
configuration from temperature No. 1 into one more like those at temperature No. 2 and 
vice versa. Obviously this update is reversible. For the acceptance probability one has to 
take the Jacobian determinant det Jj-(-) of the transformation into account. For a general 
transformation we can then use the acceptance probability 



A swap = min 



det JA{s} 2 ) exp {-faHjT({s}*)} - faH\^-\{sY)]} 
' det J F ({s Y) exp {-faH[{sY\ - faH[({sY)]} 



(15) 



For most transformations, computing the Jacobian is a very cumbersome task, it is there- 
fore advisable to use a transformation, which is composed of elementary steps /, which only 
manipulate one field variable at a time and only depend on its nearest neighbors. Then the 
Jacobian matrix ds'/ds, where s' = f(s), can be easily computed along with its determinant. 
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For the Heisenberg spin glass, we propose a transformation which is cooling the configu- 
ration when moving towards a lower temperature and heating it up when moving towards a 
higher one. The specific / we tested here is given by 

->/ - , P ■ 

s~ = s x cos a + T—7 sin a 

\P\ 

— * 

with a — e\p\ where e is a tunable parameter, p is the vector in the s x -S x plane orthogonal 

to s x 

P = S x (S x ■ s x ) s x 

with S x as defined in Eq. (TIP]) . In case of cooling, it reduces the angle between the S x and 
s x by a. For the inverse operation, a non-linear equation has to be solved. This can be 
achieved by a simple Newton iteration which converges very quickly. 

In order to compute the Jacobian of the transformation /, we rotate the coordinate 
system of the integration over s x such that the z-axis is parallel to S x . Since only the angle 
9 between s x and S x is altered, the relevant integration measure is dcos^. For the angle 
after the cooling 9' = 9 — a, we therefore have 

(cos a — cot 9 sin a) = dcos#|l — eS x ■ s^|(cosa — cot 9 sin a) 

and we get for the Jacobian determinant 

det Jj{s x ) — |1 — eS x ■ s^|(cosa — cot 9 sin a) , 

which has to be accumulated multiplicatively over all steps of the cooling/heating in order 
to get the aggregated value for the whole sweep. 

One might expect that the optimal value of the parameter e depends on the pair of 
temperatures. However, to keep things simple, we have used the same value of e for all of 
them. Since (3 i+ i/(3i decreases with increasing lattice size L, also the optimal value of e is 
decreasing with increasing L. In order to tune e we have monitored the acceptance rate 
A. Reasonable estimates of A can already be obtained from rather short runs; Here we 
performed runs with 10000 cycles each. The optimal value does not depend strongly on the 
particular set of coupling constants. 

C. Heat-bath and overrelaxation updates 

In an elementary step of the algorithm, the field at a single site of the lattice is updated. 
Using these updates, we sweep through the lattice in typewriter fashion. To this end we 
use heat-bath and overrelaxation updates: In the case of the heat-bath update, we chose 
the component of the new field that is parallel to the nearest neighbor sum S x , defined in 
Eq. (HUD, as 

= \n{z + {z- 1 -Z)r)/\PS X \ , (16) 

where z = exp(—/3\S x \) and r is a random number that is uniformly distributed in [0,1]. 
The two orthogonal components 

s M) = y / 1 _( s W)2 sin j 4°' 2 ) = ^/l-(4 p) ) 2 cos0 (17) 



d cos 9' = d cos 9 



1 - 



da 
d9 
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where is uniformly distributed in [0, 2tt]. In an elementary overrelaxation update the field 
s x is replaced by 

s> = 2^S X - s x . (18) 

The overrelaxation update takes considerably less CPU-time than the heat-bath update, 
since it requires neither random numbers nor the evaluation of transcendental functions. The 
overrelaxation update by itself is not ergodic, since it keeps the energy constant. Therefore 
it has to be supplemented by Metropolis, or as it is the case here, heat-bath updates. It has 
been demonstrated that such a hybrid of heat-bath and overrelaxation updates is clearly 
more efficient than heat-bath updates alone. For a discussion see for example section IV of 

0. 



IV. NUMERICAL RESULTS 

In order to test the performance of the algorithm, we have performed simulations for 
L = 16, 24 and 32. Setting up the simulation, we closely follow Ref. [s]: In the tempering 
algorithm, we simulate temperatures T = 1/(3 from T min = 0.12 up to T max = 0.19. The 
intermediate temperatures are given by 

rp \ (i-l)/(JV T -l) 



- max 



Ti = T max ( ) (19) 

J- rr 



where i = 1,2, Nt, with Nt = 15, 27 and 43 is used for L = 16, 24 and 32, respectively. 
For each tempering update, we perform a heat-bath sweep followed by |L overrelaxation 
sweeps, again as in Ref. [s} . We have not used improvements of this strategy studied in 
Refs. , since they are complementary to the ones discussed here. In our implementa- 

tion, a heat-bath sweep takes about 5 times more CPU-time than an overrelaxation sweep. 
In the case of the standard tempering update the CPU-time needed is small compared with 
that needed for a heat-bath sweep. For the decimation, the tempering update takes a little 
less CPU-time than a heat-bath sweep, while for the cooling/heating method the tempering 
update takes about twice the CPU-time of a heat-bath sweep. In particular in the case of 
the cooling/heating method, we did not spent much time on optimizing our implementation. 
In both cases, the CPU-time taken by the tempering update is still clearly smaller than that 
required by the total of the heat-bath and the overrelaxation sweeps. 

In the following, we will use the acceptance rate of the replica exchange step, the round 
trip time of the replicas and the auto-correlation time of the overlap susceptibility as figures 
of merit for the performance of the algorithm. Since they might depend strongly on the 
particular set of couplings {Jij}, it is quite important to test all variants of the parallel 
tempering algorithm on the same sets. For each of the lattices sizes, we have therefore 
generated ten realizations of the {Jij}, on which we performed our tests. 

In the case of the standard and the decimation tempering, we have performed 100000, 
200000 and 500000 update cycles for each {Jij} for L = 16, 24 and 32, respectively. For 
the cooling/heating method we have performed 500000 update cycles throughout. These 
numbers are clearly larger than the number of update cycles required for equilibration. In 
the following, we use e = 0.017 for L — 16 and e = 0.01 for L = 24. 
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L 


standard 


decimation 


cool/heat 


16 


0.114 — 0.122 


0.237 — 0.256 


0.290 — 0.249 


24 


0.116 — 0.126 


0.239 — 0.262 


0.240 — 0.320 


32 


0.134 — 0.145 


0.262 — 0.284 





TABLE I. We give the acceptance rates for the standard and our two improved tempering methods 
as a function of the lattice size. The acceptance rates depend mildly on the pair of j3- values of the 
swap. In all cases it is a monotonic function of f3. In the table we give the acceptance rate for the 
pairs with the smallest and largest values of /3. 

A. Acceptance rates of the tempering method 

In table [J we have summarized our results for the acceptance rates of the tempering 
update. We find that for the given choice of 0i, the acceptance rates can approximately 
be doubled by our improved tempering methods. In almost all cases, the acceptance rate 
slightly increases with increasing /3; only in the case of the cooling/heating method there is 
a decrease for L = 16. It seems that our choice of the parameter e of the cooling/heating 
procedure for L = 16 is better suited for small values of /3 than for large ones. For L = 24, 
the situation is just the opposite. It seems to be beneficial, to tune e for each pair of j3 
values separately. 

In the case of L = 24, we tried to determine by how much we can reduce Nj< in the case of 
the improved tempering, here only considering decimation. For two of the coupling sets, we 
performed runs using Nt = 22 instead of Nt = 27. We find acceptance rates of A m 0.146 
up to A ~ 0.163, i.e. still a bit larger than with the standard tempering and Nt = 27. 

B. Round trip and autocorrelation times 

Our next task is to determine, whether these increased acceptance rates lead to smaller 
autocorrelation times. To this end, we have studied the round trip time and the integrated 
auto correlation times of the overlap susceptibility at the lowest temperature T m i n = 0.12. 

The round trip time is defined in the following way: we count how often a configuration 
runs from the highest temperature T max to the lowest temperature T min and back to T max . 
This is done for all 2Nt configurations. The round trip time t R is then given as the number 
of all sweeps divided by the number of round trips. 

In the case of L = 16, the round-trip times for the standard algorithm range from t R = 
2495(46) up to 2942(53). These times are reduced to t R = 1147(12) up to 1467(19) by the 
decimation method. We have computed the speed-up factor ^standard Ari.decimation for each 
of the ten coupling sets. The average is 2.1. The round-trip times for the cooling/heating 
procedure give t R = 901(4) to t R = 1170(6), which gives comparable speed-ups of 2.6. 

For L = 24, the round-trip times for the standard algorithm range from t R = 9332(209) 
up to 15485(593). For the decimation, we get t R = 4369(87) up to 8100(165). The average 
speed-up is 1.9. The cooling/heating gives round-trip times of t R = 4038(52) to 7437(152) 
and a speed-up of 2.0. 

For L = 32, the round-trip times for the standard algorithm range from t R = 27451(374) 
up to 46206(1452). For the improved tempering we get t R = 16356(131) up to 27432(843). 
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acc. 


rate 




round trip 




7> + (S) 


L 


decim. 


cool/heat 


decim. 


cool/heat 


decim. 


cool/heat 


16 


2.09 


2.42 


2.1 


2.6 


2.3 


2.5 


24 


2.07 


2.33 


1.9 


2.0 


2.3 


2.5 


32 


1.96 




1.7 




2.4 





TABLE II. Speed-up for the decimation and the cooling/heating procedure with respect to the 
standard parallel tempering, given in terms of the acceptance rate of the exchange step, the round- 
trip time and the auto-correlation time of the overlap susceptibility. The error on these numbers 
is around 2 on the last digit. 



With an average speed-up is 1.7. 

The statistical errors that we quote above are only rough estimates, since they are ob- 
tained by blocking the whole data set in ten sub-ensembles. The fluctuations between differ- 
ent coupling sets are larger than these individual errors. Fortunately, the relative speed-ups 
fluctuate only mildly and can therefore be determined to about 10% accuracy. 

Next we have computed integrated autocorrelation times defined by 



Tint 

Z 

t=l 



where p(t) is the normalized autocorrelation function defined by 

(0(i)Q(i + t))-(Q)* 
Pit) = W) _ W (21) 

and the upper end of the summation is chosen self-consistently as tf — cr int . Since the 
integrated autocorrelation times of the overlap susceptibility are larger than those of the 
internal energy, we shall only discuss the former in the following. We have computed v int for 
the three choices c = 4, 6 and 10. The extracted auto-correlation times differ considerably 
due to rather long tails in the auto-correlation functions. The variation of the integrated 
autocorrelation time over the different coupling sets is similar to that of the round-trip times. 
On the other hand, the relative speed-up in the autocorrelation times that is obtained by 
the improved tempering methods depends only little of the parameter c and the coupling 
set. In table [TT] we give the speed-ups obtained with c = 4. The improvement found here is 
somewhat larger than that seen in the round trip times. 



C. Equilibration 

The equilibration time is an important quantity in spin glass simulations, because in order 
to perform the averages over different coupling sets, many ensembles have to be simulated, in 
particular since the variation of interesting observables turns out to be large. Unfortunately, 
we were not able to systematically study the equilibration. To get an impression, we have 
focussed on the coupling set 5 for L = 16. This coupling set has the largest round-trip time 
among the 10 sets that we have studied. 
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FIG. 1. (Color online) We plot the overlap susceptibility x at T = 0.12 and L = 16 with the 
coupling set 5 as a function of the Monte Carlo time. We have averaged over 2000 independent runs. 
All runs were started with s x = (1,0,0). The straight lines give our estimate for the expectation 
value of x an< 3 its statistical error, obtained from the simulations discussed in the previous section. 
The equilibrium value is approached faster using the improved tempering than using the standard 
one. 



We did the first 1000 iterations of the update 2000 times with different random seeds. 
All these 2000 simulations are started with s x = (1,0,0). In figured] we give the averages 
of the overlap susceptibility for T = 0.12 as a function of the Monte Carlo time. We see a 
clear speed-up comparing the standard tempering simulation with the improved tempering 
one, e.g. the value 400 is reached at t ~ 230 in the case of the improved simulation, while 
in the case of the standard simulation this is the case for t ~ 370. These 2000 simulations 
did cost about 4 days of CPU time. Therefore we abstained from redoing such simulations 
for all our coupling sets and for larger values of L. 



V. SUMMARY AND CONCLUSIONS 



In this letter, we discuss two methods to improve on the replica exchange step of parallel 
tempering. The key idea in both is not to leave the configurations as they are and try to 
swap them between temperatures, but to transform them during this step. In both methods, 
we find an improvement of the step by roughly a factor of two. 
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Appendix A: Evaluation of Eq. (|14l) 

Here we give details of the fast numerical evaluation of rj xgB sinh(i? x ), which appears in 
Eq. (114j) . First we write 



and proceed, depending on the value of R x in the following way: For R x <J 17.5 , within 
the numerical precision of double precision numbers ln[l — exp(— 2R X )\ = 0. For 1 < R x < 
17.5, we have used a pre-computed table, for R x = 0.9, 1.0, 17.5, 17.6 In order to get 
ln[l — exp(— 2R X )} for any R x in 1 < R x < 17.5, we have quadratically interpolated the 
entries of this table. We have checked that the error of this evaluation is at most of the 
order 10~ 16 . If R x < 1, which very rarely happens in the simulation, we have used the 
functions of the C-library to compute ln[l — exp(— 2R X )}. 
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In [2 smh(R x )} = R x + ln[l - exrj(-2R x )} 
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